entropy rate estimation
Reviews: Entropy Rate Estimation for Markov Chains with Large State Space
The paper proposes an entropy estimate for Markov chains by reduction to optimal entropy estimation for i.i.d samples. Sample complexity analysis is provided for different mixing scenarios with a minimax rate established for a particular rate. The estimator is used to assess the capacity of language models. This is a very clear and well-written paper. I appreciate the efforts done by the authors to summarize the results.
Entropy Rate Estimation for Markov Chains with Large State Space
Han, Yanjun, Jiao, Jiantao, Lee, Chuan-Zheng, Weissman, Tsachy, Wu, Yihong, Yu, Tiancheng
Entropy estimation is one of the prototypical problems in distribution property testing. To consistently estimate the Shannon entropy of a distribution on $S$ elements with independent samples, the optimal sample complexity scales sublinearly with $S$ as $\Theta(\frac{S}{\log S})$ as shown by Valiant and Valiant \cite{Valiant--Valiant2011}. Extending the theory and algorithms for entropy estimation to dependent data, this paper considers the problem of estimating the entropy rate of a stationary reversible Markov chain with $S$ states from a sample path of $n$ observations. In comparison, the empirical entropy rate requires at least $\Omega(S 2)$ samples to be consistent, even when the Markov chain is memoryless. In addition to synthetic experiments, we also apply the estimators that achieve the optimal sample complexity to estimate the entropy rate of the English language in the Penn Treebank and the Google One Billion Words corpora, which provides a natural benchmark for language modeling and relates it directly to the widely used perplexity measure.